Levenshtein distances fail to identify language relationships accurately
The Levenshtein distance is a simple distance metric derived from the number of edit operations needed to transform one string into another. This metric has received recent attention as a means of automatically classifying languages into genealogical subgroups. In this article I test the performance of the Levenshtein distance for classifying languages by subsampling three language subsets from a large database of Austronesian languages. Comparing the classification proposed by the Levenshtein distance to that of the comparative method shows that the Levenshtein classification is correct only 40% of the time. Standardizing the orthography increases the performance, but only to a maximum of 65% accuracy within language subgroups. The accuracy of the Levenshtein classification decreases rapidly with phylogenetic distance, failing to discriminate between homology and chance similarity across distantly related languages. This poor performance suggests the need for more linguistically nuanced methods for automated language classification tasks.
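As a minimal sketch of the metric this abstract describes, the following Python computes the Levenshtein distance with the standard dynamic-programming recurrence, together with a length-normalized variant. The function names are illustrative assumptions, and dividing by the length of the longer string is only one common normalization; the article may use a different one.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of insertions, deletions and substitutions turning a into b."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter string in the inner loop to save memory
    prev = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, ca in enumerate(a, start=1):
        curr = [i]  # deleting i characters of a yields the empty prefix of b
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def normalized_levenshtein(a: str, b: str) -> float:
    """Distance divided by the longer length (an assumed convention), in [0, 1]."""
    longest = max(len(a), len(b))
    return levenshtein(a, b) / longest if longest else 0.0
```

Averaging `normalized_levenshtein` over the word pairs for each meaning in two word lists gives one language-to-language distance, which is the kind of quantity a distance-based classification would then cluster.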
Basic vocabulary and Bayesian phylolinguistics
Donohue et al.’s critique of our work on the origins and spread of the Austronesian language family is marred by misunderstandings of our approach. We respond to these by noting that our Bayesian phylogenetic approach: (1) distinguishes between retentions and innovations probabilistically, (2) focuses on basic vocabulary not ‘the lexicon’, (3) eliminates known loanwords, and (4) produces results that are congruent with the results of the comparative method and conflict with the scenarios requiring unprecedented amounts of language shift postulated by Donohue et al.
A lexicostatistical study of the Khasian languages: Khasi, Pnar, Lyngngam, and War
This paper presents the results of lexicostatistical, glottochronological, and Bayesian phylogenetic analyses of a 200 word data set for Standard Khasi, Lyngngam, Pnar and War. Very few works have appeared on the subject of the internal classification of the Khasian branch of Austroasiatic, leaving the existing reference literature disappointingly incomplete. The present analysis supports both the strong identity of Khasian as a unitary branch and an internally nested branching structure that fits neatly with known historical, geographical and linguistic facts. Additionally, lexically based dating methods suggest that the internal diversification of Khasian began roughly between 1500 and 2000 years ago. Copyright Information: Copyright for this paper vested in the authors. Released under Creative Commons Attribution License
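The lexically based dating mentioned in this abstract can be illustrated with the classic glottochronological formula t = ln(c) / (2 ln(r)), where c is the proportion of shared cognates and r is the assumed retention rate per millennium. The sketch below is an illustration only: the function name is hypothetical, the default rate of 0.805 is a figure often quoted for 200-item lists, and nothing here is taken from the paper's own procedure.

```python
import math


def divergence_time_millennia(shared_cognates: float, retention_rate: float = 0.805) -> float:
    """Classic glottochronological estimate t = ln(c) / (2 * ln(r)), in millennia.

    shared_cognates: proportion of cognate items shared by two languages, 0 < c <= 1.
    retention_rate: assumed proportion retained per millennium, 0 < r < 1
                    (0.805 is an assumed default, not a value from the paper).
    """
    if not 0.0 < shared_cognates <= 1.0:
        raise ValueError("shared_cognates must be in (0, 1]")
    if not 0.0 < retention_rate < 1.0:
        raise ValueError("retention_rate must be in (0, 1)")
    return math.log(shared_cognates) / (2.0 * math.log(retention_rate))
```

Two languages that each retain a fraction r of the ancestral list after one millennium are expected to share r squared of it, so `divergence_time_millennia(r ** 2, r)` returns 1.0 for any valid r.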
Why do religious cultures evolve slowly? The cultural evolution of cooperative calling and the historical study of religions
Collective representations are the result of an immense cooperation, which stretches out not only into space but into time as well; to make them, a multitude of minds have associated, united and combined their ideas and sentiments: for them, long generations have accumulated their experience and their knowledge. A special intellectual activity is therefore concentrated in them, which is infinitely richer and complexer than that of the individual.
(Émile Durkheim, Elementary Forms of the Religious Life, [1912] 1965: 29)
The languages and folkways of ancient peoples hold little relevance for us, except in one respect: the religions of the ancient world remain our religions. Though religions change, core features of the scriptures and rituals of the world's most popular religious traditions appear to have been conserved with remarkably high fidelity. We explain slow religious change in terms of how religion facilitates cooperation at large social scales. At the end, we clarify how historians of religion, in collaboration with psychologists and computational biologists, might test and improve explanations such as ours. This research was supported by the John F. Templeton Foundation (Testing the Functional Roles of Religion in Human Society, no. 28745), the Royal Society of New Zealand (The Cultural Evolution of Religion, no. 11-UOA-23
Population structure and cultural geography of a folktale in Europe
Despite a burgeoning science of cultural evolution, relatively little work has focused on the population structure of human cultural variation. By contrast, studies in human population genetics use a suite of tools to quantify and analyse spatial and temporal patterns of genetic variation within and between populations. Human genetic diversity can be explained largely as a result of migration and drift giving rise to gradual genetic clines, together with some discontinuities arising from geographical and cultural barriers to gene flow. Here, we adapt theory and methods from population genetics to quantify the influence of geography and ethnolinguistic boundaries on the distribution of 700 variants of a folktale in 31 European ethnolinguistic populations. We find that geographical distance and ethnolinguistic affiliation exert significant independent effects on folktale diversity and that variation between populations supports a clustering concordant with European geography. This pattern of geographical clines and clusters parallels the pattern of human genetic diversity in Europe, although the effects of geographical distance and ethnolinguistic boundaries are stronger for folktales than genes. Our findings highlight the importance of geography and population boundaries in models of human cultural variation and point to key similarities and differences between evolutionary processes operating on human genes and culture.
Scintillation in the Circinus Galaxy water megamasers
We present observations of the 22 GHz water vapor megamasers in the Circinus galaxy made with the Tidbinbilla 70 m telescope. These observations confirm the rapid variability seen earlier by Greenhill et al. (1997). We show that this rapid variability can be explained by interstellar scintillation, based on what is now known of the interstellar scintillation seen in a significant number of flat spectrum AGN. The observed variability cannot be fully described by a simple model of either weak or diffractive scintillation. Comment: 10 pages, 5 figures. AJ accepted
How Accurate and Robust Are the Phylogenetic Estimates of Austronesian Language Relationships?
We recently used computational phylogenetic methods on lexical data to test between two scenarios for the peopling of the Pacific. Our analyses of lexical data supported a pulse-pause scenario of Pacific settlement in which the Austronesian speakers originated in Taiwan around 5,200 years ago and rapidly spread through the Pacific in a series of expansion pulses and settlement pauses. We claimed that there was high congruence between traditional language subgroups and those observed in the language phylogenies, and that the estimated age of the Austronesian expansion at 5,200 years ago was consistent with the archaeological evidence. However, the congruence between the language phylogenies and the evidence from historical linguistics was not quantitatively assessed using tree comparison metrics. The robustness of the divergence time estimates to different calibration points was also not investigated exhaustively. Here we address these limitations by using a systematic tree comparison metric to calculate the similarity between the Bayesian phylogenetic trees and the subgroups proposed by historical linguistics, and by re-estimating the age of the Austronesian expansion using only the most robust calibrations. The results show that the Austronesian language phylogenies are highly congruent with the traditional subgroupings, and the date estimates are robust even when calculated using a restricted set of historical calibrations.
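To make the idea of comparing a tree against proposed subgroups concrete, here is a minimal sketch that scores a tree by the fraction of expected subgroups it recovers as exact clades. This is a simple stand-in chosen for illustration and is not necessarily the metric the authors used. The nested-tuple tree encoding, the function names, and the placeholder leaf labels are all assumptions.

```python
def clades(tree):
    """Return the leaf set under every internal node of a nested-tuple tree."""
    found = set()

    def leaves(node):
        if isinstance(node, str):  # a leaf is a language label
            return frozenset([node])
        members = frozenset().union(*(leaves(child) for child in node))
        found.add(members)  # record the clade rooted at this internal node
        return members

    leaves(tree)
    return found


def subgroup_congruence(tree, subgroups):
    """Fraction of proposed subgroups that appear as exact clades in the tree."""
    tree_clades = clades(tree)
    recovered = sum(1 for group in subgroups if frozenset(group) in tree_clades)
    return recovered / len(subgroups)


# Placeholder labels, not real languages.
toy_tree = ((("A", "B"), "C"), ("D", "E"))
toy_subgroups = [{"A", "B"}, {"D", "E"}, {"C", "D"}]
score = subgroup_congruence(toy_tree, toy_subgroups)  # two of three recovered
```

An all-or-nothing clade match is strict: a subgroup that is almost recovered scores the same as one that is scattered across the tree, which is one reason more graded tree comparison metrics exist.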
CLICS²: An Improved Database of Cross-Linguistic Colexifications: Assembling Lexical Data with the Help of Cross-Linguistic Data Formats
The Database of Cross-Linguistic Colexifications (CLICS) has established a computer-assisted framework for the interactive representation of cross-linguistic colexification patterns. In its current form, it has proven to be a useful tool for various kinds of investigation into cross-linguistic semantic associations, ranging from studies on semantic change, patterns of conceptualization, and linguistic paleontology. But CLICS has also been criticized for obvious shortcomings, ranging from the underlying dataset, which still contains many errors, up to the limits of cross-linguistic colexification studies in general. Building on recent standardization efforts reflected in the Cross-Linguistic Data Formats initiative (CLDF) and novel approaches for fast, efficient, and reliable data aggregation, we have created a new database for cross-linguistic colexifications, which not only supersedes the original CLICS database in terms of coverage but also offers a much more principled procedure for the creation, curation and aggregation of datasets. The paper presents the new database and discusses its major features.
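A colexification occurs when a single word form in a language expresses two different concepts. The sketch below counts, for each concept pair, how many languages colexify it in a toy word list. It is an illustration only and not the CLICS pipeline: the `(language, concept, form)` row layout, the function name, and the placeholder data are assumptions.

```python
from collections import defaultdict
from itertools import combinations


def colexifications(rows):
    """Map each concept pair to the number of languages using one form for both."""
    concepts_by_form = defaultdict(set)
    for language, concept, form in rows:
        concepts_by_form[(language, form)].add(concept)

    languages_by_pair = defaultdict(set)
    for (language, _form), concepts in concepts_by_form.items():
        # every unordered pair of concepts sharing this form is colexified
        for pair in combinations(sorted(concepts), 2):
            languages_by_pair[pair].add(language)

    return {pair: len(langs) for pair, langs in languages_by_pair.items()}


# Placeholder languages and forms, invented for illustration.
toy_rows = [
    ("L1", "TREE", "ba"), ("L1", "WOOD", "ba"),
    ("L2", "TREE", "ki"), ("L2", "WOOD", "ki"), ("L2", "FIRE", "to"),
    ("L3", "TREE", "mu"), ("L3", "WOOD", "na"),
]
counts = colexifications(toy_rows)
```

Treating the resulting counts as edge weights between concepts yields a network of the kind that colexification databases visualize.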